2 research outputs found

    Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start

    Every day, thousands of users sign up as new Wikipedia contributors. Once joined, these users have to decide which articles to contribute to, which users to seek out and learn from or collaborate with, etc. Any such task is a hard and potentially frustrating one given the sheer size of Wikipedia. Supporting newcomers in their first steps by recommending articles they would enjoy editing or editors they would enjoy collaborating with is thus a promising route toward converting them into long-term contributors. Standard recommender systems, however, rely on users' histories of previous interactions with the platform. As such, these systems cannot make high-quality recommendations to newcomers without any previous interactions -- the so-called cold-start problem. The present paper addresses the cold-start problem on Wikipedia by developing a method for automatically building short questionnaires that, when completed by a newly registered Wikipedia user, can be used for a variety of purposes, including article recommendations that can help new editors get started. Our questionnaires are constructed based on the text of Wikipedia articles as well as the history of contributions by the already onboarded Wikipedia editors. We assess the quality of our questionnaire-based recommendations in an offline evaluation using historical data, as well as an online evaluation with hundreds of real Wikipedia newcomers, concluding that our method provides cohesive, human-readable questions that perform well against several baselines. By addressing the cold-start problem, this work can help with the sustainable growth and maintenance of Wikipedia's diverse editor community.
    Comment: Accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM-2019)

    Citations with identifiers in Wikipedia

    This dataset includes a list of citations with identifiers extracted from the most recent version of Wikipedia across all language editions. The data was parsed from the Wikipedia content dumps published on March 1, 2018.

    License
    All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/

    Projects
    Previous versions of this dataset ("Scholarly citations in Wikipedia") were limited to the English language edition. The current version includes one dataset for each of the 298 language editions that Wikipedia supports as of March 2018. Projects are identified by their ISO 639-1/639-2 language code, per https://meta.wikimedia.org/wiki/List_of_Wikipedias.

    Identifiers
    • PubMed IDs (pmid) and PubMed Central IDs (pmcid)
    • Digital Object Identifiers (doi)
    • International Standard Book Numbers (isbn)
    • arXiv IDs (arxiv)

    Format
    Each row in the dataset represents a citation as a (Wikipedia article, cited source) pair. Metadata about when the citation was first added is included.
    • page_id -- The identifier of the Wikipedia article (int), e.g. 1325125
    • page_title -- The title of the Wikipedia article (utf-8), e.g. Club cell
    • rev_id -- The Wikipedia revision where the citation was first added (int), e.g. 282470030
    • timestamp -- The timestamp of the revision where the citation was first added (ISO 8601 datetime), e.g. 2009-04-08T01:52:20Z
    • type -- The type of identifier, e.g. pmid
    • id -- The id of the cited source (utf-8), e.g. 18179694

    Source code
    https://github.com/halfak/Extract-scholarly-article-citations-from-Wikipedia (MIT Licensed)
    A copy of this dataset is also available at https://analytics.wikimedia.org/datasets/archive/public-datasets/all/mwrefs/

    Notes
    Citation identifiers are extracted as-is from Wikipedia article content. Our spot-checking suggests that 98% of identifiers resolve.
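    The row format described above can be read with a few lines of Python. This is a minimal sketch, not the dataset's official loader: the description does not state the on-disk file format, so the tab-separated layout with a header row is an assumption, as are the names Citation, read_citations, and SAMPLE. The sample row is assembled from the example values given in the field list.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, TextIO


@dataclass(frozen=True)
class Citation:
    """One (Wikipedia article, cited source) pair, mirroring the listed fields."""
    page_id: int
    page_title: str
    rev_id: int
    timestamp: datetime
    type: str  # identifier type, e.g. pmid, pmcid, doi, isbn, arxiv
    id: str    # kept as a string: identifiers are extracted as-is


def read_citations(handle: TextIO) -> Iterator[Citation]:
    """Yield Citation records from a file-like object.

    Assumption: tab-separated values with a header row naming the six fields.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Timestamps look like 2009-04-08T01:52:20Z; the trailing Z means UTC.
        stamp = datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        yield Citation(
            page_id=int(row["page_id"]),
            page_title=row["page_title"],
            rev_id=int(row["rev_id"]),
            timestamp=stamp.replace(tzinfo=timezone.utc),
            type=row["type"],
            id=row["id"],
        )


# Hypothetical sample built from the example values in the field list.
SAMPLE = (
    "page_id\tpage_title\trev_id\ttimestamp\ttype\tid\n"
    "1325125\tClub cell\t282470030\t2009-04-08T01:52:20Z\tpmid\t18179694\n"
)

citations = list(read_citations(io.StringIO(SAMPLE)))
```

    The id field is deliberately left as text, since DOIs, ISBNs, and arXiv IDs are not numeric and the notes say identifiers are not normalized.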